This paper deals with the problem of estimating the binomial parameter via the
nonparametric empirical Bayes approach. This estimation problem exhibits the surprising
phenomenon that estimators which are asymptotically optimal in the usual empirical Bayes
sense do not exist (Robbins (1956, 1964)). However, as pointed out by Liang (1984) and
Gupta and Liang (1986), it is possible to construct asymptotically optimal empirical Bayes
estimators if the unknown prior is symmetric about the point 1/2. In this paper, assuming
symmetric priors, a monotone empirical Bayes estimator is constructed by using the isotonic
regression method. This estimator is asymptotically optimal in the usual empirical Bayes
sense. The corresponding rate of convergence is investigated and shown to be at least of
order $n^{-1}$, where $n$ is the number of past observations at hand.


Key  Words  and  Phrases:  Bayes  estimator,  empirical  Bayes,  asymptotically  optimal,  rate 
of  convergence,  isotonic  regression,  symmetric  prior. 
1.  INTRODUCTION


Consider a Bernoulli process consisting of $N$ trials. Let $p$ denote the
probability of success for each trial and $Y$ stand for the number of successes among the
total $N$ trials. Then $Y$ follows a binomial distribution with probability function
$f(y|p) = \binom{N}{y} p^y (1-p)^{N-y}$, $y = 0, 1, \ldots, N$. Suppose that the parameter $p$
is a realization of a random variable $P$ having a prior distribution $G$. Thus, under the
squared error loss, given $Y = y$, the Bayes estimator of $p$ is the posterior mean of $P$
given by

$$\varphi_G(y) = \frac{\int_0^1 p f(y|p)\,dG(p)}{\int_0^1 f(y|p)\,dG(p)} = \frac{w(y)}{h(y)}, \tag{1.1}$$

where $h(y) = \int_0^1 p^y (1-p)^{N-y}\,dG(p)$ and $w(y) = \int_0^1 p^{y+1} (1-p)^{N-y}\,dG(p)$.
Also, $f_G(y) = \binom{N}{y} h(y)$ is the marginal probability function of $Y$. The minimum
Bayes risk is $r(G) = r(G, \varphi_G) = E[(\varphi_G(Y) - P)^2]$.
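As a rough illustration of (1.1), the quantities $h(y)$, $w(y)$, $\varphi_G(y)$ and $f_G(y)$ can be computed directly when the prior is discrete. The three-point prior below and the function name `bayes_quantities` are assumptions made for this sketch only; they do not appear in the paper.

```python
from math import comb

def bayes_quantities(N, support, weights):
    """Return (h, w, phi, f) as lists indexed by y = 0..N for a discrete prior.

    support: points p in (0, 1); weights: prior masses g summing to 1.
    """
    pairs = list(zip(support, weights))
    h = [sum(g * p**y * (1 - p)**(N - y) for p, g in pairs) for y in range(N + 1)]
    w = [sum(g * p**(y + 1) * (1 - p)**(N - y) for p, g in pairs) for y in range(N + 1)]
    phi = [w[y] / h[y] for y in range(N + 1)]        # posterior mean, eq. (1.1)
    f = [comb(N, y) * h[y] for y in range(N + 1)]    # marginal pmf of Y
    return h, w, phi, f

# Hypothetical prior, symmetric about 1/2: masses 0.3, 0.4, 0.3 at 0.2, 0.5, 0.8.
h, w, phi, f = bayes_quantities(4, [0.2, 0.5, 0.8], [0.3, 0.4, 0.3])
```

For this symmetric prior the posterior mean at $y = N/2$ is $1/2$ up to rounding, and `phi` increases with $y$.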


When the prior distribution $G$ is unknown, many authors, based on the past observations,
have treated this estimation problem via the empirical Bayes approach of Robbins (1956,
1964). For details, the reader is referred to Liang and Huang (1988), Vardeman (1978)
and the related references. However, as pointed out by Robbins (1956, 1964), this estimation
problem exhibits the surprising phenomenon that estimators which are asymptotically
optimal in the usual empirical Bayes sense do not exist. This is due to the fact that
the function $w(y)$ cannot be consistently estimated when the prior distribution $G$ is
completely unknown. To remedy this deficiency, Robbins (1956) suggested taking one more
observation at each stage and proposed an estimator which is asymptotically optimal in a
modified sense. Gupta and Liang (1989) treated this estimation problem through the
parametric empirical Bayes approach, assuming the prior to be a member of the beta
distribution family with unknown hyperparameters and then using the past observations to
estimate the unknown hyperparameters. Liang (1984) and Gupta and Liang (1986) have pointed
out that if the unknown prior is symmetric about the point $\frac{1}{2}$, it is possible to
construct asymptotically optimal empirical Bayes estimators for the binomial parameter $p$.
However, no estimators were proposed.

In this paper, we deal with this estimation problem through the nonparametric empirical
Bayes approach assuming symmetric priors. A monotone empirical Bayes estimator
is constructed by using the isotonic regression method. This estimator is asymptotically
optimal in the usual empirical Bayes sense. The corresponding rate of convergence is
investigated and shown to be at least of order $n^{-1}$, where $n$ is the number of past
observations at hand.


2.  CONSTRUCTION  OF  EMPIRICAL  BAYES  ESTIMATORS 

For each $j = 1, 2, \ldots$, let $(Y_j, P_j)$ be a pair of random variables where $Y_j$ is
observable but $P_j$ is not. Conditional on $P_j = p_j$, $Y_j$ has a binomial probability
function $f(y|p_j) = \binom{N}{y} p_j^y (1-p_j)^{N-y}$, $y = 0, 1, \ldots, N$. It is assumed
that $P_j$, $j = 1, 2, \ldots$, are independently distributed with common unknown prior
distribution $G$. Therefore, $Y_j$, $j = 1, 2, \ldots$, are iid with marginal probability
function $f_G(y)$. Let $\mathbf{Y}_n = (Y_1, \ldots, Y_n)$ denote the $n$ past observations
and $Y_{n+1} = Y$ the current random observation. In the empirical Bayes estimation case,
an estimator $\varphi_n$ for the present problem is a function based on the sequence of
past observations $\mathbf{Y}_n$ and the present observation $Y = y$. We investigate this
estimation problem under the following assumption.

Assumption A: The prior distribution $G$ is symmetric about the point $\frac{1}{2}$ and $N$
is an even number.

Under Assumption A, we have the following lemma which describes the relationship
between $w(y)$ and $h(y)$.


Lemma 2.1. Under Assumption A, we have

a) $w(N/2) = \frac{1}{2} h(N/2)$.

b) $w(x) = w(N - x - 1)$ for $x = 0, 1, \ldots, N - 1$.

c) $w(x) + w(N - x) = h(x) = h(N - x)$, $x = 0, 1, \ldots, N$.

d) $w(x) + w(x + 1) = h(x + 1)$, $x = 0, 1, \ldots, N - 1$.

e) $\varphi_G(x) = 1 - \varphi_G(N - x)$, $x = 0, 1, \ldots, N$, and

f) $w(x) = \sum_{i=0}^{x - N/2} h(x - i)(-1)^i - (-1)^{x - N/2} h(N/2)/2$, $x = N/2, \ldots, N$.

Proof: Straightforward computation.
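The identities of Lemma 2.1 are easy to check numerically. The sketch below assumes a discrete prior given as a list of hypothetical `(p, mass)` pairs and tests parts (a) to (d) and (f); the helper names are illustrative and not from the paper.

```python
def h(y, N, prior):
    """h(y) = integral of p^y (1-p)^(N-y) dG(p) for a discrete prior."""
    return sum(g * p**y * (1 - p)**(N - y) for p, g in prior)

def w(y, N, prior):
    """w(y) = integral of p^(y+1) (1-p)^(N-y) dG(p) for a discrete prior."""
    return sum(g * p**(y + 1) * (1 - p)**(N - y) for p, g in prior)

def check_lemma_2_1(N, prior, tol=1e-12):
    """Return True if identities (a)-(d) and (f) hold up to tol (N even)."""
    m = N // 2
    ok = abs(w(m, N, prior) - h(m, N, prior) / 2) < tol                        # (a)
    for x in range(N):
        ok &= abs(w(x, N, prior) - w(N - x - 1, N, prior)) < tol               # (b)
        ok &= abs(w(x, N, prior) + w(x + 1, N, prior) - h(x + 1, N, prior)) < tol  # (d)
    for x in range(N + 1):
        ok &= abs(w(x, N, prior) + w(N - x, N, prior) - h(x, N, prior)) < tol  # (c)
    for x in range(m, N + 1):
        # Alternating sum of part (f).
        alt = sum((-1)**i * h(x - i, N, prior) for i in range(x - m + 1))
        alt -= (-1)**(x - m) * h(m, N, prior) / 2
        ok &= abs(alt - w(x, N, prior)) < tol                                  # (f)
    return bool(ok)

symmetric = [(0.2, 0.3), (0.5, 0.4), (0.8, 0.3)]   # symmetric about 1/2
asymmetric = [(0.2, 0.5), (0.6, 0.5)]              # not symmetric
```

Part (d) holds for any prior; the other parts rely on the symmetry in Assumption A, which is why the asymmetric example fails the check.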


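For each $y = 0, 1, \ldots, N$, define

$$f_n(y) = f_n(N - y) = \begin{cases} \dfrac{1}{2n} \sum_{j=1}^{n} I_{\{y, N-y\}}(Y_j) & \text{if } y \ne \frac{N}{2}, \\[1ex] \dfrac{1}{n} \sum_{j=1}^{n} I_{\{N/2\}}(Y_j) & \text{if } y = \frac{N}{2}, \end{cases}$$

$$h_n(y) = f_n(y) \Big/ \binom{N}{y},$$

and

$$w_n(y) = \begin{cases} h_n(N/2)/2 & \text{if } y = \frac{N}{2}, \\[1ex] \sum_{i=0}^{y - N/2} h_n(y - i)(-1)^i - (-1)^{y - N/2} h_n(N/2)/2 & \text{if } N \ge y > \frac{N}{2}, \\[1ex] h_n(y) - w_n(N - y) & \text{if } 0 \le y \le \frac{N}{2} - 1. \end{cases}$$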
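A minimal sketch of these definitions in code, assuming the past observations are supplied as a list `ys` of integers in $0, \ldots, N$ (the function name and argument layout are illustrative choices, not from the paper):

```python
from math import comb

def symmetrized_estimates(ys, N):
    """Return (f_n, h_n, w_n) as lists indexed by y = 0..N, for even N."""
    n, m = len(ys), N // 2
    counts = [0] * (N + 1)
    for y in ys:
        counts[y] += 1
    # f_n pools the counts at y and N - y, except at the midpoint N/2.
    f_n = [counts[m] / n if y == m else (counts[y] + counts[N - y]) / (2 * n)
           for y in range(N + 1)]
    h_n = [f_n[y] / comb(N, y) for y in range(N + 1)]
    w_n = [0.0] * (N + 1)
    w_n[m] = h_n[m] / 2
    for y in range(m + 1, N + 1):
        # Alternating sum from Lemma 2.1(f), with h replaced by h_n.
        s = sum((-1)**i * h_n[y - i] for i in range(y - m + 1))
        w_n[y] = s - (-1)**(y - m) * h_n[m] / 2
    for y in range(m):
        w_n[y] = h_n[y] - w_n[N - y]
    return f_n, h_n, w_n

# Hypothetical sample of n = 8 past observations with N = 4.
f_n, h_n, w_n = symmetrized_estimates([0, 1, 2, 2, 3, 4, 4, 2], 4)
```

By construction `f_n` is symmetric and sums to one, and `w_n` satisfies the empirical analogue of Lemma 2.1(d) on the upper half of the range.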


Both $h_n(y)$ and $w_n(y)$ are unbiased estimators of $h(y)$ and $w(y)$, respectively,
$y = 0, 1, \ldots, N$. Thus, it is intuitive to use $w_n(y)/h_n(y)$ as an estimator for
$\varphi_G(y) = w(y)/h(y)$. However, this naive estimator may have serious deficiencies.
First, $h_n(y)$ may be equal to zero, in which case the ratio $w_n(y)/h_n(y)$ is not well
defined. Second, the value of $w_n(y)/h_n(y)$ may be greater than 1 or less than 0, while
$0 \le \varphi_G(y) \le 1$ for all $y = 0, 1, \ldots, N$. Hence, in the following, we seek a
better estimator. The following lemma states the monotone properties of the functions
$\varphi_G(y)$, $h(y)$ and $w(y)$. These properties suggest a way to construct reasonable
empirical Bayes estimators.


Lemma 2.2. a) For any prior distribution $G$, $\varphi_G(y)$ is an increasing function of
$y$, $y = 0, 1, \ldots, N$.

b) Under Assumption A, both $h(y)$ and $w(y)$ are increasing in $y$ for $y = \frac{N}{2}, \ldots, N$.


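Based on the monotone properties described in Lemma 2.2, we let $\{\tilde{h}_n(x)\}_{x=N/2}^{N}$
be the isotonic regression of $\{h_n(x)\}_{x=N/2}^{N}$ with equal weights and, from Lemma 2.1,
define

$$\tilde{w}_n(x) = \sum_{i=0}^{x - N/2} \tilde{h}_n(x - i)(-1)^i - (-1)^{x - N/2} \tilde{h}_n(N/2)/2, \quad \frac{N}{2} \le x \le N.$$

Thus, $\tilde{h}_n(x)$ is nondecreasing in $x$ for $\frac{N}{2} \le x \le N$, and by this
nondecreasing property, $\tilde{w}_n(x) \ge 0$ for $\frac{N}{2} \le x \le N$. However,
$\tilde{w}_n(x)$ may still not possess the nondecreasing property. Thus, we let
$\{w_n^*(x)\}_{x=N/2}^{N}$ be the isotonic regression of $\{\tilde{w}_n(x)\}_{x=N/2}^{N}$ with
equal weights and, from Lemma 2.1, define $h_n^*(x) = w_n^*(x - 1) + w_n^*(x)$ for
$\frac{N}{2} + 1 \le x \le N$ and $h_n^*(N/2) = 2 w_n^*(N/2)$. By the nondecreasing property of
$w_n^*(x)$, $h_n^*(x)$ is nondecreasing in $x$ for $\frac{N}{2} \le x \le N$. Now, for
$\frac{N}{2} \le x \le N$, define

$$\varphi_n(x) = \begin{cases} w_n^*(x)/h_n^*(x) & \text{if } h_n^*(x) \ne 0, \\ \frac{1}{2} & \text{if } h_n^*(x) = 0. \end{cases}$$

Since $\varphi_n(x)$ may not be a nondecreasing function of $x$ for $\frac{N}{2} \le x \le N$,
we consider the isotonic regression $\{\varphi_n^*(x)\}_{x=N/2}^{N}$ of
$\{\varphi_n(x)\}_{x=N/2}^{N}$ with equal weights. Also, for $0 \le x \le \frac{N}{2} - 1$,
define $\varphi_n^*(x) = 1 - \varphi_n^*(N - x)$. Now one can see that $\varphi_n^*(x)$ is
nondecreasing in $x$ for $x = 0, 1, \ldots, N$. We propose using $\varphi_n^*(x)$ as an
estimator of $\varphi_G(x)$, $x = 0, 1, \ldots, N$.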
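The construction above can be sketched in a few lines. The equal-weight isotonic regressions are computed here with the pool-adjacent-violators algorithm, which is one standard way to obtain them; the paper does not prescribe an algorithm. The names `pava` and `monotone_eb` are hypothetical, and the input is assumed to be the list $h_n(0), \ldots, h_n(N)$ with $N$ even.

```python
def pava(values):
    """Equal-weight nondecreasing isotonic regression (pool adjacent violators)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def monotone_eb(h_n, N):
    """Return phi*_n(x) for x = 0..N, given h_n as a list indexed 0..N."""
    m = N // 2
    h_t = pava(h_n[m:])                       # h~_n on x = N/2..N
    w_t = []
    for k in range(len(h_t)):                 # k = x - N/2
        s = sum((-1)**i * h_t[k - i] for i in range(k + 1))
        w_t.append(s - (-1)**k * h_t[0] / 2)  # w~_n via Lemma 2.1(f)
    w_s = pava(w_t)                           # w*_n
    h_s = [2 * w_s[0]] + [w_s[k - 1] + w_s[k] for k in range(1, len(w_s))]
    phi = [w_s[k] / h_s[k] if h_s[k] != 0 else 0.5 for k in range(len(w_s))]
    phi_s = pava(phi)                         # phi*_n on the upper half
    lower = [1 - phi_s[N - x - m] for x in range(m)]  # reflection for x < N/2
    return lower + phi_s
```

For example, with $N = 4$ and `h_n = [3/16, 1/32, 1/16, 1/32, 3/16]`, the first isotonic step pools $h_n(2)$ and $h_n(3)$, and the resulting estimate is `[0.125, 0.5, 0.5, 0.5, 0.875]`. Floating-point rounding can in principle perturb the alternating sums, so exact rational arithmetic may be preferable for very small counts.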


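Remark 2.1. By the nondecreasing property of $w_n^*(x)$ in $x$, $x = \frac{N}{2}, \ldots, N$,
$\varphi_n(x) \ge \frac{1}{2}$ for all $x \ge \frac{N}{2}$ and hence, $\varphi_n^*(x) \ge \frac{1}{2}$
for $x \ge \frac{N}{2}$. Also, $h_n^*(x) = 0$ iff $w_n^*(x) = 0$ iff $\tilde{w}_n(y) = 0$ for all
$\frac{N}{2} \le y \le x$ iff $\tilde{h}_n(y) = 0$ for all $\frac{N}{2} \le y \le x$ iff
$h_n(y) = 0$ for all $y = N - x, \ldots, x$, where $x \ge \frac{N}{2}$.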
3.  ASYMPTOTIC  OPTIMALITY


Let $\psi_n(y)$ denote an empirical Bayes estimator based on the current observation $y$
and the past data $\mathbf{Y}_n = (Y_1, \ldots, Y_n)$. Let $r(G, \psi_n)$ denote the
conditional Bayes risk (conditional on $\mathbf{Y}_n$) of the estimator $\psi_n$ and
$Er(G, \psi_n)$ the associated overall Bayes risk, where the expectation $E$ is taken with
respect to $\mathbf{Y}_n$. Since $r(G)$ is the minimum Bayes risk, $r(G, \psi_n) - r(G) \ge 0$
and therefore $Er(G, \psi_n) - r(G) \ge 0$. The nonnegative difference $Er(G, \psi_n) - r(G)$
is often used as a measure of the optimality of the empirical Bayes estimator $\psi_n$.

Definition 3.1. A sequence of empirical Bayes estimators $\{\psi_n\}_{n=1}^{\infty}$ is said
to be asymptotically optimal in $E$ at least of order $\beta_n$ relative to the prior
distribution $G$ if $Er(G, \psi_n) - r(G) \le O(\beta_n)$, where $\{\beta_n\}_{n=1}^{\infty}$
is a sequence of positive numbers such that $\lim_{n \to \infty} \beta_n = 0$.

The usefulness of empirical Bayes estimators in practical applications clearly depends
on the rates at which the risks of the successive estimators approach the
minimum Bayes risk. In the following, the performance of the proposed empirical Bayes
estimators $\{\varphi_n^*\}$ is evaluated on the basis of the rates of convergence of the
nonnegative difference $Er(G, \varphi_n^*) - r(G)$. Without loss of generality, we assume
that $G(0) < \frac{1}{2}$ to exclude the extreme case. In the following, all the computations
are made under Assumption A.

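Lemma 3.1. For each $y = \frac{N}{2} + 1, \ldots, N$, suppose that $h_n^*(y) > 0$. Then for
$0 < t < \min(1 - \varphi_G(y), \varphi_G(y) - \frac{1}{2})$,

$$|\varphi_n^*(y) - \varphi_G(y)| > t \;\Rightarrow\; (N + 2) \sum_{x=N/2}^{N} |h_n(x) - h(x)|^2 > [h(N/2)\,t]^2.$$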

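Proof: $|\varphi_n^*(y) - \varphi_G(y)| > t \Rightarrow \varphi_n^*(y) - \varphi_G(y) > t$ or
$\varphi_n^*(y) - \varphi_G(y) < -t$. By the definition of $\varphi_n^*(y)$, $w_n^*(x)$,
$\tilde{w}_n(x)$, $\tilde{h}_n(x)$, Lemma 2.1 and Theorem 2.1 of Barlow et al. (1972), we have

$$\begin{aligned}
& \varphi_n^*(y) - \varphi_G(y) > t \\
\Rightarrow\; & \varphi_n(x) - \varphi_G(y) > t \text{ for some } N/2 + 1 \le x \le y \\
\Rightarrow\; & w_n^*(x)[1 - \varphi_G(y) - t] - w_n^*(x - 1)[\varphi_G(y) + t] > 0 \text{ for some } N/2 + 1 \le x \le y \\
\Rightarrow\; & [w_n^*(x) - w(x)][1 - \varphi_G(y) - t] - [w_n^*(x - 1) - w(x - 1)][\varphi_G(y) + t] > h(N/2)\,t \\
& \text{for some } N/2 + 1 \le x \le y \\
\Rightarrow\; & [w_n^*(x) - w(x)] > h(N/2)\,t \text{ or } [w_n^*(x - 1) - w(x - 1)] < -h(N/2)\,t \\
& \text{for some } N/2 + 1 \le x \le y \\
\Rightarrow\; & \sup_{N/2 \le x \le N} |w_n^*(x) - w(x)| > h(N/2)\,t \\
\Rightarrow\; & \sum_{x=N/2}^{N} |w_n^*(x) - w(x)|^2 > [h(N/2)\,t]^2 \\
\Rightarrow\; & \sum_{x=N/2}^{N} |\tilde{w}_n(x) - w(x)|^2 > [h(N/2)\,t]^2,
\end{aligned}$$

since $\sum_{x=N/2}^{N} [w_n^*(x) - w(x)]^2 \le \sum_{x=N/2}^{N} (\tilde{w}_n(x) - w(x))^2$; see
Theorem 2.1 of Barlow et al. (1972).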

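Now, by the definition of $\tilde{w}_n(x)$, we have, for each $x = \frac{N}{2}, \ldots, N$,

$$\begin{aligned}
[\tilde{w}_n(x) - w(x)]^2
&= \left\{ \sum_{i=0}^{x - N/2} [\tilde{h}_n(x - i) - h(x - i)](-1)^i - (-1)^{x - N/2} [\tilde{h}_n(N/2) - h(N/2)]/2 \right\}^2 \\
&\le 2 \sum_{z=N/2}^{N} [\tilde{h}_n(z) - h(z)]^2 \\
&\le 2 \sum_{z=N/2}^{N} [h_n(z) - h(z)]^2,
\end{aligned}$$

where the last inequality is again from Theorem 2.1 of Barlow et al. (1972).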

Based on the above discussions, we conclude that

$$\varphi_n^*(y) - \varphi_G(y) > t \;\Rightarrow\; (N + 2) \sum_{x=N/2}^{N} [h_n(x) - h(x)]^2 > [h(N/2)\,t]^2. \tag{3.1}$$

Analogous to the preceding discussion, under the assumption that $h_n^*(y) > 0$, we can
obtain:

$$\varphi_n^*(y) - \varphi_G(y) < -t \text{ and } h_n^*(y) > 0 \;\Rightarrow\; (N + 2) \sum_{z=N/2}^{N} [h_n(z) - h(z)]^2 > [h(N/2)\,t]^2. \tag{3.2}$$

Therefore, (3.1) and (3.2) together lead to the result of the lemma.

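Remark 3.1. Note that for each $y = \frac{N}{2} + 1, \ldots, N$, as $t \ge 1 - \varphi_G(y)$,
$\{\varphi_n^*(y) - \varphi_G(y) > t\} = \emptyset$; also, as $t \ge \varphi_G(y) - \frac{1}{2}$,
$\{\varphi_n^*(y) - \varphi_G(y) < -t\} = \emptyset$.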

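Lemma 3.2. For each $y = \frac{N}{2} + 1, \ldots, N$ and $t > 0$,

$$P\{|\varphi_n^*(y) - \varphi_G(y)| > t \text{ and } h_n^*(y) > 0\} \le \sum_{x=N/2}^{N} 2 \exp\left\{ -\frac{4n \left[\binom{N}{x} h(N/2)\, t\right]^2}{(N + 2)^2} \right\}.$$

Proof: By Remark 3.1, $P\{|\varphi_n^*(y) - \varphi_G(y)| > t, h_n^*(y) > 0\} = 0$ if
$t \ge \max(1 - \varphi_G(y), \varphi_G(y) - \frac{1}{2})$. Thus, as
$0 < t < \max(1 - \varphi_G(y), \varphi_G(y) - \frac{1}{2})$, from Lemma 3.1,

$$\begin{aligned}
P\{|\varphi_n^*(y) - \varphi_G(y)| > t, h_n^*(y) > 0\}
&\le \sum_{x=N/2}^{N} P\left\{ |h_n(x) - h(x)|^2 > \frac{2 [h(N/2)\, t]^2}{(N + 2)^2} \right\} \\
&= \sum_{x=N/2}^{N} P\left\{ |f_n(x) - f_G(x)| > \frac{\sqrt{2} \binom{N}{x} h(N/2)\, t}{N + 2} \right\} \\
&\le \sum_{x=N/2}^{N} 2 \exp\left\{ -\frac{4n \left[\binom{N}{x} h(N/2)\, t\right]^2}{(N + 2)^2} \right\},
\end{aligned}$$

where the last inequality is obtained from Theorem 1 of Hoeffding (1963).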
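The form of Hoeffding's inequality used here is the two-sided bound $P\{|\bar{X}_n - \mu| \ge s\} \le 2\exp(-2ns^2)$ for the mean of $n$ independent variables taking values in $[0, 1]$. As a small sanity check, the sketch below compares this bound with the exact tail probability of a Bernoulli sample mean; the function names and the values of `q` and `s` are illustrative assumptions.

```python
from math import comb, exp

def hoeffding_bound(n, s):
    """Two-sided Hoeffding bound for the mean of n variables in [0, 1]."""
    return 2 * exp(-2 * n * s * s)

def exact_two_sided_tail(n, q, s):
    """P(|K/n - q| >= s) for K ~ Binomial(n, q), by direct summation."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n + 1) if abs(k / n - q) >= s)

# Compare exact tails with the bound for a hypothetical success probability 0.3.
rows = [(n, exact_two_sided_tail(n, 0.3, 0.1), hoeffding_bound(n, 0.1))
        for n in (10, 50, 200)]
```

In each row the exact tail does not exceed the bound, and both decay exponentially in $n$, which is what drives the rate in the theorem that follows.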

The following theorem is our main result.

Theorem 3.1. Let $\{\varphi_n^*\}_{n=1}^{\infty}$ be the sequence of empirical Bayes
estimators constructed in Section 2. Then, under Assumption A,

$$Er(G, \varphi_n^*) - r(G) \le O(n^{-1}).$$
Proof: Straightforward computation leads to the following.

$$\begin{aligned}
0 \le Er(G, \varphi_n^*) - r(G)
&= \sum_{y=0}^{N} E[(\varphi_n^*(y) - \varphi_G(y))^2] f_G(y) \\
&= 2 \sum_{y=N/2+1}^{N} E[(\varphi_n^*(y) - \varphi_G(y))^2] f_G(y).
\end{aligned} \tag{3.3}$$

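For each $y = \frac{N}{2} + 1, \ldots, N$,

$$\begin{aligned}
E[(\varphi_n^*(y) - \varphi_G(y))^2]
&= \int_0^{\max(1 - \varphi_G(y),\, \varphi_G(y) - \frac{1}{2})} 2t\, P\{|\varphi_n^*(y) - \varphi_G(y)| > t, h_n^*(y) > 0\}\,dt \\
&\quad + (\varphi_G(y) - 1/2)^2 P\{h_n^*(y) = 0\}.
\end{aligned}$$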

Now, from Remark 2.1,

$$\begin{aligned}
P\{h_n^*(y) = 0\} &= P\{f_n(x) = 0 \text{ for all } x = N - y, \ldots, y\} \\
&= [1 - F_G(y) + F_G(N - y - 1)]^n \\
&= \exp\left(-n \ln (1 - F_G(y) + F_G(N - y - 1))^{-1}\right) \\
&\le O(n^{-1}),
\end{aligned} \tag{3.4}$$

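where $F_G(\cdot)$ is the marginal distribution function of $Y$. Also, from Lemma 3.2, and the
fact that $\max(1 - \varphi_G(y), \varphi_G(y) - \frac{1}{2}) \le \frac{1}{2}$ for
$y \ge \frac{N}{2} + 1$, we have

$$\begin{aligned}
& \int_0^{\max(1 - \varphi_G(y),\, \varphi_G(y) - \frac{1}{2})} 2t\, P\{|\varphi_n^*(y) - \varphi_G(y)| > t, h_n^*(y) > 0\}\,dt \\
&\le \int_0^{1/2} 4t \sum_{x=N/2}^{N} \exp\left\{ -\frac{4n \left[\binom{N}{x} h(N/2)\, t\right]^2}{(N + 2)^2} \right\} dt \\
&\le \frac{1}{n} \sum_{x=N/2}^{N} \frac{(N + 2)^2}{2 h^2(N/2) \binom{N}{x}^2} \\
&= O(n^{-1}).
\end{aligned} \tag{3.5}$$

From (3.4) and (3.5), we conclude that for each $y = \frac{N}{2} + 1, \ldots, N$,

$$E[(\varphi_n^*(y) - \varphi_G(y))^2] \le O(n^{-1}). \tag{3.6}$$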
Since $N$ is finite and fixed, (3.6) and (3.3) together complete the proof of the theorem.
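The last step of (3.4) replaces a geometric rate by $O(n^{-1})$. One way to see why this is harmless: for $0 < \rho < 1$, the function $n\rho^n$ is maximized at $n = 1/\ln(1/\rho)$, so $\rho^n \le C/n$ with $C = 1/(e \ln(1/\rho))$. The sketch below checks this numerically; the value $\rho = 0.9$ and the function names are arbitrary illustrative choices.

```python
from math import e, log

def zero_prob(n, rho):
    """rho**n, the form taken by P{h*_n(y) = 0} in (3.4) with rho in (0, 1)."""
    return rho ** n

def n_inverse_constant(rho):
    """Constant C such that rho**n <= C / n for every n >= 1."""
    return 1.0 / (e * log(1.0 / rho))

rho = 0.9
C = n_inverse_constant(rho)
# Largest value of n * rho**n over a grid of sample sizes; it stays below C.
worst = max(n * zero_prob(n, rho) for n in range(1, 1001))
```

The geometric term is in fact of much smaller order than $n^{-1}$ for large $n$, so the overall rate is governed by the bound in (3.5).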